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Abstract.  This  paper  describes  the  various  algorithms  and  software  imple- 
mentations of  the  Level- Index  LI  and  Symmetric  Level-Index  SLI  arithmetic 
schemes.  After  a brief  introduction  to  the  munber  representations  and  the 
arithmetic  algorithms,  we  describe  the  original  precompiler  for  including  LI 
and  SLI  variables  cuid  their  arithmetic  in  a Fortran  77  program.  The  Turbo 
Pasceil  unit  for  SLI  arithmetic  includes  several  extended  real  operations  and, 
also,  complex  operations  using  a modulus-argument  representation.  The  For- 
tran 90  implementation  avoids  the  need  for  a precompiler  and  includes  ex- 
tensive testing  of  the  basic  algorithms.  Finally,  we  describe  the  parallel  SIMD 
implementation  of  SLI  arithmetic  on  a MasPar  MP-1  using  a massively  parallel 
version  of  C. 


1.  Introduction 

This  paper  is  concerned  with  the  software  implementations  of  the  level-index  LI 
and  symmetric  level-index  SLI  arithmetic  schemes  which  are  available  for  experi- 
mental computing  on  various  architectures.  Earlier  versions  of  some  of  these  have 
been  described  elsewhere,  and  in  those  cases  only  a summary  of  their  features  is 
included  here.  Before  describing  the  paper  in  more  detail  and  introducing  the  LI 
and  SLI  systems,  we  discuss  very  briefly  some  of  the  motivation  for  this  and  other 
arithmetic  systems. 

One  of  the  primary  drawbacks  of  the  binary  floating-point  system  for  real-number 
representation  and  arithmetic  in  a general  computational  environment  is  its  suscep- 
tibility to  overflow  and  underflow.  Several  suggestions  have  been  made  and  studied 
to  either  alleviate  or  overcome  these  difficulties,  and  to  address  the  related  problem 
of  the  “spacing”  between  successive  representable  numbers  in  the  floating-point  sys- 
tem. In  binary  floating-point  arithmetic  the  difference  between  successive  machine 
numbers  is  a step  function  which  doubles  with  every  unit  increase  in  the  exponent 
value.  The  systems  which  attempt  to  overcome  the  overflow  problem  necessarily 
address  this  problem  too  since  greater  range  within  a given  wordlength  can  only  be 
achieved  by  allowing  this  spacing  to  erode  gradually. 

There  have  been  several  suggestions  for  extended  floating-point  systems  which 
are  mostly  modifications  of  the  tapered  floating-point  proposal  of  Morris  [26].  Pro- 
posals for  eventual  hardware  systems  have  included  Matsui  and  Iri’s  [25]  which 
uses  not  only  variable  lengths  for  its  mantissa  and  exponent  but  also  allows  the 
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possibility  of  extending  to  further  “levels”  once  the  basic  representable  range  is 
exceeded.  However  these  additional  levels  were  not  implemented  in  their  work. 
The  system  introduced  by  Hamada  [15]  and  extended  in  [16]  and  [39]  modified  the 
scheme  of  Matsui  and  Iri  in  using  only  its  level  0 and  changing  the  way  in  which  the 
“pointer”  (the  indicator  as  to  how  many  bits  are  used  by  the  exponent)  is  stored. 
This  pointer  becomes  part  of  the  exponent  information.  A different  modification 
of  the  floating-point  scheme  was  proposed  by  Hull  and  his  coworkers  (see  [17],  for 
example)  using  a scheme  which  allows  variable  range  and  precision  for  its  decimal 
floating-point  arithmetic. 

In  all  of  these  schemes  overflow  remains  a possibility,  although  a much  reduced 
one.  The  step-function  nature  of  the  spacing  between  successive  numbers  remains 
although  the  relative  precision  of  the  representation  is  eroded  monotonically  as  the 
range  increases  for  all  except  for  Hull’s  scheme,  in  which  precision  is  increased  with 
the  allowable  range. 

One  other  approach  to  the  overflow  problem  is  logarithmic  arithmetic.  Algo- 
rithms for  this  have  been  discussed  in,  for  example,  [2]  and  [3],  error  analysis  in 
this  system  is  introduced  in  [4],  and  potential  hardware  designs  are  described  in 
[18,  19]  and  [20].  The  basic  principle  of  the  logarithmic  scheme  is  that  a real  num- 
ber is  represented  by  the  logarithm  of  its  absolute  value  relative  to  some  fixed  base. 
A second  sign  is  used  for  the  sign  of  the  original  real  number.  Thus  A > 0 is 
represented  by  Ex  such  that 

X = 

where  r is  the  fixed  radix  (usually  2)  and  Ex  is  a fixed-point  exponent.  The  number 
of  fractional  bits  in  the  representation  of  this  exponent  dictates  the  (constant) 
relative  spacing  between  successive  numbers  in  this  system. 

The  level-index  LI  system  of  number  representation  and  computer  arithmetic 
was  first  introduced  by  Clenshaw  and  Olver  in  [9].  The  representation  is  based  on 
the  use  of  a generalized  logarithm  function.  In  that  sense  the  system  can  be  viewed 
as  a natural  extension  of  the  ideas  of  the  logarithmic  number  systems.  It  is  also 
a generalization  of  the  Matsui-Iri  representation  in  using  even  more  “levels”  than 
they  envisage  for  extending  the  binary  floating-point  system.  The  details  of  the 
system  and  its  symmetric  counterpart  SLI  are  detailed  in  subsequent  sections  but 
an  introduction  to  the  basic  idea  and  some  of  the  notation  will  be  helpful  here. 

The  representation  of  a large  positive  number  can  be  achieved  by  taking  its 
natural  logarithm  repeatedly  until  a result  is  obtained  in  the  interval  [0,  1).  The 
number  of  times  the  logarithm  has  been  taken  is  called  the  level  while  the  final  value 
in  [0,  1)  is  called  the  index.  Since  the  level  is  an  integer,  no  ambiguity  is  created  by 
representing  the  original  large  number  by  the  sum  of  its  level  and  its  index.  This 
representation  function  is  called  a generalized  logarithm  and  is  denoted  here  by 
Its  inverse  function  is  called  a generalized  exponential  function  and  is  denoted  by 
<j>.  We  refer  to  a generalized  logarithm  and  a generalized  exponential  function  since 
there  are  many  such  functions.  They  have  been  studied  extensively  in  [8]  while 
their  properties  for  computer  arithmetic  have  been  the  subject  of  a series  of  papers 
several  of  which  are  cited  in  the  references.  A good  introduction  to  the  subject  can 
be  found  in  [11]. 

In  the  level-index  scheme,  a real  number  is  represented  by  its  sign  together  with 
the  LI  representation  of  its  absolute  value.  Thus  X is  represented  by  ±x  where  the 
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sign  is  the  sign  of  X and 

(1)  x = ^(|Xl), 

or  equivalently, 


(2)  \X\  = 4>{x). 

The  generalized  exponential  function  most  commonly  used  for  the  LI  arithmetic 
system  is  defined  by 


(3) 


(j>{x) 


X if  0 < a:  < 1, 


The  corresponding  generalized  logarithm  is  given  by 


(4) 


^{X)  = 


X 

l + V’(ln(X)) 


if  0 < < 1, 

if  X > 1. 


Thus,  if  a:  = / + / where  I is  the  integer  part  of  x is  the  LI  representation  of 
X > 0,  it  follows  that 

.f 

X ^(f>{x)^e^''  , a:  = V’(^)  = i + ln(ln(...ln(X))) 


where  the  exponentiation  or  logarithm  is  performed  I times. 

The  SLI  system  [12]  uses  a similar  representation  function  for  “large”  quantities, 
that  is  |X|  > 1.  However  for  “small”  numbers  the  LI  representation  of  its  reciprocal 
is  used;  this  reciprocal  need  not  be  formed,  this  is  just  a convenient  way  to  describe 
the  representation.  Thus  a real  number  X is  represented  in  the  SLI  system  by 

(5)  X = ±<P{xf^ 


where  x > 1.  The  principal  sign  is  the  sign  of  X and  the  reciprocation  sign  is  +1 
if  |X|  > 1 and  —1  otherwise.  It  follows  that  for  the  SLI  representation 


(6)  x=  l + V’(|ln|X||). 

The  basic  properties  of  the  LI  and  SLI  systems  including  representation,  arith- 
metic and  analysis  have  been  extensively  discussed  in  [9,  10,  11,  12,  22],  and  [27]. 
Possible  hardware  algorithms  were  the  subject  of  [28]  and  [33].  Applications  using 
this  system  have  been  detailed  in  [13,  21,  23],  and  [24]. 

The  next  section  of  this  paper  is  devoted  to  a brief  introduction  to  the  SLI 
arithmetic  algorithms.  This  is  followed  in  §3  by  a description  of  the  first  software 
implementation  using  a Fortran  77  precompiler.  This  implementation  uses  an  older 
version  of  the  arithmetic  algorithms  but  has  the  virtue  of  allowing  existing  For- 
tran 77  code  to  be  run  with  just  a change  of  variable  declarations.  In  §4,  a Turbo 
Pascal  implementation  is  described.  This  was  the  first  implementation  to  use  the 
algorithms  described  in  §2.  It  also  has  been  augmented  with  various  extended  real 
operations,  complex  SLI  arithmetic,  and  mixed  operations  for  integer-SLI  arith- 
metic. The  representation  and  arithmetic  adopt  the  internal  wordlengths  obtained 
in  the  various  error  analyses.  The  implementation  described  in  §5  relates  to  devel- 
opment of  Fortran  90  code  which  eliminates  the  need  for  a precompiler.  This  code 
also  allows  the  internal  wordlengths  to  be  varied,  making  experimentation  with  the 
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arithmetic  algorithms  easier.  The  last  implementation  is  written  in  MPL,  a mas- 
sively parallel  version  of  ANSI  C,  for  a MasPar  MP-1  system.  This  takes  advantage 
of  the  parallelism  to  reduce  the  time-penalties  associated  with  any  software  arith- 
metic. It  also  allows  exploration  of  potential  parallelism  in  any  future  hardware 
designs. 

2.  LI  AND  SLI  Arithmetic  Algorithms 

Algorithms  for  the  basic  arithmetic  were  first  presented  in  [10]  and  [12].  In 
the  LI  algorithms,  there  are  special  cases  to  be  treated  when  one  or  more  of  the 
arguments  has  level  0.  Because  these  special  cases  do  not  arise  in  SLI  arithmetic, 
they  are  omitted  from  the  following  description;  the  missing  details  can  be  found 
in  [10].  The  resulting  LI  algorithms  are  simplified,  and  they  form  the  basis  of  the 
fundamental  SLI  arithmetic  operations. 

LI  multiplication  and  division  can  be  achieved  by  taking  logarithms  (or  decre- 
menting the  levels)  of  the  arguments  and  using  the  addition  or  subtraction  algo- 
rithms. We  concentrate  therefore  on  LI  addition  and  subtraction  in  which  we  seek 
the  LI  representation  <i>{z): 

(7)  <t>{z)  = z^x±Y  = 4>{^)±4>{y) 

where  we  shall  assume  that 


® ^ 2/  ^ 1 S’Hd  z >1. 

Denote  the  levels  and  indices  of  these  representations  by 

X = I + f,  y = m-\-  g,  and  z — n + h 


Now,  dividing  (7)  by  the  larger  argument,  we  obtain 


(8) 


1 ± bo. 


The  LI  algorithms  are  based  on  computing  bo  by  using  a (short)  recursive  sequence 
and  then  obtaining  z from  cq  using  another  recursive  sequence.  One  characteriza- 
tion of  this  algorithm  is  that  (f>  (z)  is  being  computed  as  a relative  perturbation  of 
the  larger  operand  4>  (x). 

To  compute  bo,  we  use  a recurrence  relation  for 


(9)  ^ fori  = m-l,m-2,...,0. 

<P(®  - 3) 

The  starting  value  for  this  recurrence  is 


1 — 


9^(1  + ff) 


exp  (ff  - (j){x  - m))  = 


7 if  m = /, 
ifm</ 


<j){x  — m -h  1) 

where  am  = — m)  is  obtained  from  another  recurrence  relation  for 


(10) 


a,  = 


(f>{x-j) 


for  j = / - 1,  / - 2, . . .,  0. 
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The  recurrence  for  this  a-sequence  begins  with  a(_i  = 1/ f)  — e ^ and  is  then 
given  by 

(11)  aj_i  = exp  for  - 1,  / - 2, 1. 

The  corresponding  recurrence  for  the  6-sequence  is 

(12)  6j_i  = exp  ^ ^ ^ for  j = m — 1,  m — 2, . . . , 1. 

Both  of  these  are  easily  verified  using  the  definitions.  For  example,  for  0 < J < /, 

“j-i  - 77 = TaJ 77  = exp{-(f){x  - j))  = exp  ( -— ) . 

(I){x-j  + l)  exp  [<f>{x  - j))  \o.j  J 

It  is  apparent  from  (11)  and  (12)  that  0 < Oj,  bj  < 1 for  every  j.  (Only  bj  can  equal 
1 and  that  only  if  i = y.)  We  observe  that,  from  j — m — 1 onwards,  these  two 
recurrence  relations  can  be  computed  in  parallel  on  a machine  with  even  minimal 
parallelism. 

Before  completing  the  description  of  the  LI  algorithm,  we  discuss  briefly  the 
modifications  that  are  needed  for  SLI  arithmetic.  Full  details  of  these  algorithms 
and  their  analysis  are  included  in  [12].  The  most  important  difference  is  that  one 
or  both  of  the  operands  may  be  “small”,  that  is,  in  reciprocal  form.  In  the  case  of 
“mixed”  arithmetic  we  seek 

(13)  <f>{z)  = Z = X±Y  = (f>{x)±(f>{y)-^ 
and,  dividing  by  the  larger  argument,  we  get 

1 

Co  = 1 ± ,,  s,,  X = 1 ± aoQo 
(f>{x)(f>{y) 

where  cq  = l/<j>{x)  is  computed  as  before  and  ao  = l/<p{y)  is  also  computed  using 
an  a-sequence  but  with  y in  place  of  x. 

In  the  “small”  case,  we  require 

(15)  <j>{z)-^  =Z  = X±Y  = cl>{x)-^±<l>{y)-^ 
and  dividing  by  l/<l>{x)  yields 

(16)  Co  ' - 1 ± ^ = 1 ± 6o  ' = 1 ±/3o 

where  the  quantity  /3o  can  be  computed  by  a recurrence  similar  to,  but  slightly 
more  complex  than,  (12). 

In  all  cases,  we  conclude  the  first  phase  of  the  algorithm  by  computing  either 
Co  or  its  reciprocal.  These  are  essentially  equivalent  for  the  purpose  of  the  SLI 
algorithm.  There  are  exceptional  cases  for  the  remaining  part  of  the  algorithm 
which  are  detailed  in  [12]  but  are  not  discussed  here.  These  relate  to  the  possibility 
that  the  difference  between  two  “large”  numbers  (or  between  a large  one  and  a 
small  one)  may  be  “small”  or  the  sum  of  tv/o  small  numbers  may  be  large.  These 
flipover  cases  can  be  treated  by  minor  variations  of  the  algorithm  described  here. 
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Otherwise  the  large  and  mixed  operations  are  identical  from  this  point.  We 
define  members  of  the  c-sequence  by 


c-  - ~ ^ 

^ - 3 

for  j = 0, 1,  ...,min(n,  /).  Now  the  condition  Cj  < aj  is  equivalent  to  cj){z  - j)  < 1, 
and  so  n = j and  h = <f>{z—j)  = Cj /aj  which  will  complete  the  computation.  There 
is  a simple  recurrence  for  most  of  this  c-sequence  which  is  essentially  the  reversal 
of  the  direction  of  (12): 

(18)  Cj+i  = 1 -H  a^+i  Incj 

for  j not  greater  than  I — 2,  with  the  recursion  terminating  if  the  above  condition 
is  satisfied.  If  necessary,  the  final  member  of  this  sequence  is  replaced  by 


hi  = f -\-  lncj_i 


and  at  most  one  further  logarithm  is  needed  in  the  eventuality  that  hi  > 1.  (One 
additional  logarithm  is  always  sufficient  since  2<f){x)  < exp(^(x))  = (j){x  + 1)  so 
that  a sum  can  never  have  a level  that  exceeds  that  of  the  larger  argument  by  more 
than  1.) 

The  completion  of  the  small  arithmetic  algorithm  is  similar.  The  biggest  differ- 
ence arises  from  the  fact  that  we  do  not  have  cq  but  Cq  It  follows  that  (18)  must 
be  modified  so  that 

C]^  1 ^1  1^ 

after  which  the  rest  of  this  sequence  can  be  computed  using  (18).  The  only  other 
difference  is  that  more  than  one  additional  logarithm  may  be  necessary  in  order  to 
obtain  the  final  /i- value.  (The  difference  of  two  small  numbers  can  be  much  smaller 
than  either  operand  and  so  its  (reciprocal)  level  can  exceed  those  of  the  operands 
by  more  than  1.) 

The  various  quantities  involved  in  (all  the  variants  of)  this  algorithm  are  uni- 
formly bounded  and  can  be  computed  to  fixed  absolute  precisions.  These  working 
precisions  are  examined  in  [10]  and  [12]. 


3.  Fortran  77  Implementation 

An  unpublished  implementation  in  Fortran  77  was  developed  in  the  mid-1980’s 
by  D.  W.  Lozier.  Its  purpose  was  to  provide  a test  vehicle  which  would  allow 
Fortran  programs  to  be  re-interpreted  and  re-executed  in  LI  or  SLI  arithmetic. 
This  was  accomplished  by  embedding  into  the  language  new  data  types  for  LI  and 
SLI  variables,  and  extending  to  these  most  of  the  standard  operations  and  functions 
for  data  of  type  REAL.  This  implementation  was  used,  for  example,  in  [21]  to  solve 
a graphics  problem  that  arose  in  a model  of  turbulent  combustion;  see  [29,  30]  for 
the  physical  and  chemical  details. 
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3.1.  Approach.  A very  important  potential  audience  for  LI  and  SLI  arithmetic 
is  in  scientific  computing.  Particular  emphasis  was  placed  on  Fortran  in  this  imple- 
mentation because  it  seemed  clear  in  the  early  and  mid  1980’s  that  it  was  the  pre- 
dominant language  of  scientific  computing.  Since  then  other  languages  and  systems 
have  gained  importance  but  the  choice  of  Fortran  as  a vehicle  for  an  implementation 
of  LI  and  SLI  arithmetic  is  still  justified. 

The  main  argument  against  Fortran  77  was  that  it  offers  no  direct  support  for  the 
introduction  of  data  types  (this  objection  is  removed  in  Fortran  90;  see  §5  below). 
The  same  objection  had  been  raised  and  overcome  in  the  1970’s  with  respect  to 
other  nonstandard  arithmetic  systems.  For  example,  multiple-precision  systems, 
which  extend  floating-point  arithmetic  to  arbitrary  precision,  are  described  in  [5] 
and  [38].  The  software  described  in  the  first  of  these  references  is  known  as  MP 
(Multiple  Precision),  and  in  the  remaining  reference  as  SP  (Super  Precision).  The 
new  SP  and  MP  data  types  are  introduced  indirectly  by  means  of  a precompiler, 
i.e.  a language  compiler  that  accepts  a Fortran-like  program  as  input  and  generates 
a standard  Fortran  program  as  output.  This  output  is  then  compiled,  linked  and 
executed  just  like  any  other  Fortran  program. 

A precompiler  is  included  with  SP  as  a necessary  part  of  a general  multiple- 
precision  computing  facility.  Originally  MP  had  no  precompiler  but  later  it  was 
linked  to  Augment  [14].  This  linkage  is  described  in  [7].  All  SP  and  MP  operations, 
including  arithmetic,  assignment,  comparison,  function  evaluation,  input,  output 
and  type  conversion,  are  performed  by  ordinary  Fortran  subprograms  of  either 
subroutine  or  function  type.  The  task  of  the  precompiler  is  to  recognize  declarations 
of  nonstandard  variables  and  to  connect  each  operation  that  involves  them  to  the 
appropriate  subprogram.  The  precompilers  operate  by  reading  an  input  file  that 
details  all  of  these  connections,  then  reading  and  translating  the  input  Fortran-like 
program. 

Since  the  precompiler  approach  had  been  successful  for  multiple-precision  arith- 
metic, it  was  decided  the  same  approach  should  be  used  to  develop  a Fortran-like 
LI  and  SLI  facility. 

3.2.  Slitran.  A Fortran-like  language,  identified  in  this  paper  as  Slitran,  was  im- 
plemented using  Augment  [14]  to  provide  three  new  numerical  data  types: 

LEVEL  INDEX 
SYMMETRIC  LEVEL  INDEX 
FLOATING  POINT 

Variables  of  these  new  types  are  carried  in  standard  variables  of  type  DOUBLE 
PRECISION,  and  operations  with  them  are  simulated  by  algorithms  using  double- 
precision floating-point  arithmetic  operations. 

Operators  and  functions  are  provided  for  each  new  type: 

Unary  operators  + and  - 
Arithmetic  operators  +,  -,  *,  /,  and  ** 

Logical  operators  .EQ.,  .NE.,  .GT.,  .GE.,  .LT.,and  .LE. 

Assignment  = 

Absolute  value  and  square  root  functions  ABS  and  SQRT 
Exponential  and  logarithmic  functions  EXP  and  LOG 
Maximum  and  minimum  functions  MAX  and  MIN 
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Where  an  operator  or  function  has  more  than  one  operand  or  argument,  all  must 
be  of  the  same  type.  With  the  obvious  exception  of  the  logical  operators,  all 
the  operators  and  functions,  including  assignments,  produce  a result  of  the  same 
type  as  the  operands.  Explicit  functions  are  provided  for  type  conversion  among  all 
combinations  of  these  types  and  the  three  Fortran  types  INTEGER,  REAL  and  DOUBLE 
PRECISION.  The  remaining  three  Fortran  types,  COMPLEX,  LOGICAL  and  CHARACTER 
are  recognized  and  processed  by  Slitran  but  they  do  not  interact  with  the  new 
types. 

The  new  type  FLOATING  POINT  was  introduced  for  two  reasons.  First,  it  pro- 
vides a simulation  of  IEEE-standard  floating-point  formats  and  arithmetic  opera- 
tions on  non-IEEE  computers.  This  was  felt  to  be  desirable  to  enable  comparisons 
to  be  made  of  new  arithmetic  systems  against  the  best  available  system  for  general 
scientiflc  computing.  It  is  usually  conceded  that  the  IEEE  standard  is  a close-to- 
optimal  speciflcation  of  floating-point  arithmetic,  at  least  for  a 32-bit  binary  format. 
Second,  the  new  type  incorporates  symbols  for  underflow,  overflow,  and  “infinite” 
and  “indefinite”  numbers,  allowing  computation  to  proceed  in  the  face  of  floating- 
point exceptions  in  a manner  analogous  to  computing  with  “Not-a-Numbers”,  or 
NaNs,  in  IEEE  arithmetic.  This  is  important  because  comparisons  will  be  wanted 
between  IEEE  arithmetic  and  the  new  arithmetic  systems  in  problems  where  IEEE 
arithmetic  leads  to  underflow  or  overflow. 

Slitran  usage  is  similar,  up  to  a point,  to  ordinary  Fortran  usage.  Variables 
are  declared  as  in  Fortran,  and  arithmetic  expressions  and  function  invocations  are 
coded  in  the  usual  fashion,  but  input  and  output  of  variables  of  the  new  types 
require  special  coding.  No  special  facilities  other  than  type  conversion  functions 
are  provided  for  input,  since  input  will  almost  always  be  within  the  normal  floating- 
point range.  Input  is  read  into  variables  of  Fortran  type,  then  converted  explicitly 
to  one  of  the  new  types. 

Ordinary  Fortran  output  normally  requires  a WRITE  statement  with  a unit  spec- 
ifier, a format  designator,  and  an  output  list: 

WRITE (unit .format)  expr_l,  expr_2,  ....  expr_n 

A new  statement  is  provided  for  usage  when  new  types  are  among  the  variables  to 
be  output: 

XWRITE(unit)  = expr_l  & expr_2  & . . . & expr_n 

where  each  expression  in  the  output  list  can  have  any  one  of  the  nine  supported 
types.  The  symbol  & is  an  “operator”  defined  such  that  Augment  can  construct  a 
character  string  from  the  output  list,  and  = is  an  “assignment”  that  causes  Augment 
to  write  the  string  to  the  unit.  Technically,  XWRITE  is  a special  form  of  Fortran 
function,  called  a “field  function”,  which  can  be  used  on  the  left  side  of  an  assign- 
ment. In  Fortran,  the  format  designator  either  points  to  or  is  itself  a character 
string.  A different  method,  using  global  parameters,  is  used  in  Slitran. 

There  are  some  67  global  parameters  in  Slitran  that  can  be  used  for  fine  control 
of  its  exact  behavior.  Each  has  an  alphanumeric  name,  represented  by  a charac- 
ter string,  and  a corresponding  integer  value.  They  are  manipulated  with  a field 
function  named  SYS  (for  system).  For  example,  let  I be  a variable  of  type  integer. 
Then 


I = SYS('SWIDTH') 
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sets  I equal  to  the  current  integer  value  of  SWIDTH  (the  output  field  width  in  char- 
acters for  variables  of  type  SYMMETRIC  LEVEL  INDEX),  and 

SYS ( ’SWIDTH')  = I 

changes  the  current  value  of  SWIDTH  to  the  value  of  I.  Statements  like 
SYS (’SWIDTH’)  = SYS (’SWIDTH’)  + 10 

are  valid;  this  one  increases  the  current  value  of  SWIDTH  by  10. 

The  global  parameters  associated  with  output  operations  specify  the  field  width, 
the  number  of  decimals  after  the  decimal  point,  and,  in  the  case  of  floating-point 
types,  the  number  of  decimals  in  the  exponent.  All  have  default  values,  e.g.,  16, 
8 and  2.  Numbers  are  right-justified  in  the  output  field,  and  the  field  is  blank- 
filled  on  the  left.  Numbers  of  type  REAL,  DOUBLE  PRECISION,  or  FLOATING  POINT 
are  expressed  in  Fortran  E format  with  one  digit  before  the  decimal  point.  When 
a number  of  type  FLOATING  POINT  is  generated  that  is  outside  the  representable 
range,  Slitran  classifies  it  as  underflow,  overflow  or  the  result  of  an  indefinite  op- 
eration (such  as  division  by  zero).  In  these  cases,  an  appropriate  alphabetic  string 
is  placed  in  the  output  field.  Numbers  of  type  LEVEL  INDEX  or  SYMMETRIC  LEVEL 
INDEX  are  expressed  in  Fortran  F format  with  one  digit  in  the  integer  part  to  rep- 
resent the  level.  The  numbers  are  enclosed  in  square  brackets,  with  the  sign  of  the 
number  in  front  of  the  opening  bracket.  The  reciprocation  indicator  in  the  case  of 
SLI  variables  is  placed  behind  the  opening  bracket. 

The  direct  output  of  LI  and  SLI  variables  using  XWRITE  displays  the  level  and 
index  in  decimal.  Usually  output  will  be  wanted  in  a more  familiar  format.  The 
type  conversion  function  FLP  is  useful  in  this  regard.  It  takes  an  argument  of  any 
type  and  converts  it  to  type  FLOATING  POINT.  If  a number  underflows  or  overflows, 
a special  bit  pattern  is  stored,  and  this  is  recognized  during  output  processing. 

The  internal  format  of  the  new  types  is  also  subject  to  control  by  global  param- 
eters. The  default  wordlength  is  32  bits,  and  the  default  floating-point  parameters 
are  those  of  the  IEEE  standard.  Because  these  three  types  are  carried  internally  in 
variables  of  type  DOUBLE  PRECISION,  the  new  types  cannot  be  simulated  to  more 
than  48  bits  or  so. 

Three  modes  of  abbreviation  are  supported  for  the  new  types:  rounding,  chopping 
and  unabbreviated.  Rounding  is  done  by  adding  a half-unit  in  the  last  bit  position 
of  the  significand  or  index,  then  truncating  and  storing  the  result.  Chopping  is 
done  by  simply  truncating  and  storing  the  result.  The  unabbreviated  mode  does 
not  modify  the  double-precision  result  before  storing  it.  Rounding  is  the  default 
mode;  the  other  modes  can  be  selected  by  setting  the  appropriate  global  parameter. 
For  example,  coding  like 

MODE  = SYS(’ABBREV’) 

SYS(’ABBREV’)  = SYS( ’UNABBR’ ) 

code  to  be  executed  to  full  available  precision 
SYS(’ABBREV’)  = MODE 

can  be  useful  in  computations  that  require  guard  digits. 

As  was  seen  in  §1,  the  generalized  exponential  function  is  determined  by  its 
deflnition  on  the  unit  interval.  Three  different  choices  may  be  selected  by  setting 
the  appropriate  global  parameter.  The  simplest  deflnition,  and  this  is  the  default, 
arises  from  the  identity  function;  see  eq.  (3).  It  is  continuous  and  continuously 
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difTerentiable  but  higher  derivatives  are  discontinuous.  The  alternative  choices  are 
much  smoother.  They  were  provided  for  experimentation  with  alternative  SLI 
arithmetic  algorithms  based  on  surface  fitting. 

The  remaining  global  parameters  will  not  be  described  here.  Most  are  used  to 
control  the  interception  and  processing  of  error  conditions. 

3.3.  Algorithms.  The  algorithms  used  to  perform  arithmetic  operations  in  Sli- 
tran  were  a precursor  to  the  ones  described  in  §2.  They  are  less  efficient  in  that  they 
employ  a doubly  recursive  computing  process.  They  are  described  here  as  a neces- 
sary part  of  the  description  of  Slitran,  and  also  because  they  afford  an  independent 
check  on  the  newer  algorithms. 

We  begin  with  LI  subtraction.  It  is  sufficient  to  consider  the  equation 

(19)  (l>{z)  - (j){x)  - (f){y)  {x  >y  > 0). 

As  in  §2,  we  use  the  notation 

(20)  X = I + f,  y = m + g,  z = n + h 

where  l,m  and  n are  the  levels,  respectively,  of  (f)[x),(f>[y)  and  ^(2).  Our  task  is  to 
compute  2,  i.e.  n and  h,  given  and  g.  Toward  this  end,  we  introduce  the 

further  equations 

(21)  4>{zi)  - - 4>{yi)  (i  = 0,  l,...,m) 

where 

(22)  Xi  = I - m-\-  i-\-  f,  yi-i  + g,  Zi-ni-\-hi. 

An  outer  recursion  generates  the  sequence  20, 21, . . . , 2^  = 2.  The  ith  term  of  this 
sequence  satisfies  the  equation 

(23) 

where  am-i  = \/^{xi)  is  a term  of  the  a-sequence  defined  in  eq.  (10)  and 

(24)  = I - <t>{yi)/4>{xi). 

Now,  if  Cq  ^ < am-i,  then  = 0 and  hi  = / am-i-  Otherwise,  logarithms  of 
(j){zi)  must  be  computed  repeatedly  until  the  result  is  in  the  interval  [0, 1).  This 
constitutes  the  inner  recursion.  Assuming  that  we  have 

(25)  4>{zi)  = ! am-iJrj , 

then  we  can  write 

(26)  ln'+'  = c<(>, 

where 

(27)  = 1 + am-i+j+i  In  . 

The  inner  process  ends  with  Ui  - j and  hi  = cP/am-i+j  when  cP  < Cm-i+j  . The 
c-sequence  and  c-recurrence  of  this  section  are  analogous  to  the  ones  found  in  §2; 
cf.  eqs.  (17)  and  (18). 
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The  outer  recursion  hinges  on  the  determination  of  the  starting  value  (24)  for 
the  c-sequence.  When  i = 0,  we  have 

(0)  1 

= 1 - amQ- 

When  z > 1,  we  can  write 

_ g(^(yi-l)-<^(ii-l)  _ g-0(2i_i)  _ \ . 

4>{Xi)  <f){l  + 

cf.  eq.  (21).  The  rightmost  member  of  this  equation  is  computed  as  an  a-sequence 
with  index  /ii_i  in  place  of  /;  cf.  eqs.  (10)  and  (11).  Then  it  is  substituted  into 
eq.  (24). 

Turning  now  to  LI  addition,  we  continue  with  the  assumption  that  x > y > 0. 
Define 

In^  <l){z)  - <f>{x  - j)  + dj  {j  -0,1,...,  1) 

Then  either  n = I and  h — f+di  (if  f+di  < 1),  or  n = /+!  and  h — ln{f+di).  When 
I — m = 0,  do  = g.  Otherwise,  the  d-sequence  is  computed  from  the  recurrence 


dr  = ln(l  + aj_idj_i) 


starting  from 


where 


di 


ln(l  + aog)  if  m = 0, 

ln(l  + l/(/)(t  + 1))  if  m > 0 


4>{t  + 1)  = = g0(x-l)-0(y-l)_ 


The  subtraction  procedure  is  called  to  compute  t,  and  the  term  l/(j>{t  + 1)  is  com- 
puted by  an  a-sequence  with  the  appropriate  index. 

Except  for  special  cases  when  the  levels  of  one  or  both  operands  are  zero,  LI 
multiplication  and  division  rely  on  the  identities 


^ g^(x-i)-0(y-i)^  (t>{x)<l){y)  - 


the  right  sides  of  which  are  easily  obtained  with  the  aid  of  the  LI  addition  and 
subtraction  procedures. 

The  SLI  representation  is  slightly  different  in  Slitran  than  in  other  implementa- 
tions. An  explicit  reciprocation  bit,  as  in  eq.  (5),  is  not  used.  Instead,  a positive 
number  X is  represented  by  the  signed  number 


(28) 


E = 'J'(X)  = 


^>(33)  — 1 

1-V'(1/X) 


if  X > 1, 
if  0 < X < 1 


where  V’  is  the  LI  mapping  (4).  The  inverse,  in  terms  of  the  inverse  LI  mapping 
(3),  is 

= |<»(1  + -) 

1 1/^(1  — a:)  if  X < 0. 


The  identity 


$(i)$(— x)  = 1 
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shows  that  changing  the  sign  of  an  internal  number  corresponds  to  reciprocating 
the  (positive)  external  number  that  it  represents.  The  actual  sign  of  the  external 
number  is  carried  separately,  as  it  is  in  the  other  SLI  implementations  discussed  in 
this  paper. 

It  is  worth  noting  that  two  signs  are  used  in  the  definition  of  fioating-point 
numbers.  The  familiar  device  of  biasing  (adding  a constant  to)  the  exponent,  so  as 
to  avoid  having  to  represent  an  explicit  sign,  is  available  also  in  SLI  representation, 
and  indeed  it  is  used  (with  a bias  constant  of  8)  in  Slitran. 

The  efficiency  of  Slitran  suffers  in  comparison  to  the  other  implementations  be- 
cause (i)  SLI  operations  are  done  indirectly  by  calling  on  LI  operations;  (ii)  LI 
operations  do  not  take  advantage  of  later  algorithmic  developments,  such  as  the 
singly  recursive  algorithms  for  addition  and  subtraction;  and  (iii)  global  parameter 
control  adds  to  the  total  run  time.  However,  it  is  a close  parallel  to  conventional 
Fortran  and  it  offers  the  user  considerable  control  over  the  algorithmic  details  of 
the  simulated  computer  arithmetic.  Furthermore,  the  inefficiencies  are  not  inherent; 
they  could  be  reduced  by  re-coding. 

4.  Turbo  Pascal  Implementation 

The  original  version  of  this  implementation  of  SLI  arithmetic  was  described  in 
[34].  Since  then  it  has  undergone  a number  of  modifications,  improvements  and 
extensions.  This  Turbo  Pascal^  implementation  was  developed  for  experimental 
computation  on  personal  computers.  It  uses  the  arithmetic  algorithms  described 
in  §2  which  avoid  the  doubly  recursive  aspect  of  the  previous  implementation.  The 
algorithms  for  SLI  arithmetic  are  based  directly  on  the  definition  of  that  represen- 
tation rather  than  using  combinations  of  LI  operations  and  the  rules  of  algebra. 

The  SLI  representation  is  mapped  into  a conventional  binary  integer  represen- 
tation in  an  order-preserving  manner.  Indeed,  the  Turbo  Pascal  type  slisingle  is 
identified  with  the  32-bit  type  longint.  The  machine  representation  consists  of  the 
appropriate  binary  encoding  of  the  two  signs  and  of  x which,  as  before,  has  an  inte- 
ger part,  the  levels  of  three  bits.  The  order-preserving  nature  of  this  representation 
is  achieved  by  using  a ones  complement  form  for  negatives,  and  complementing  the 
level  and  index  of  quantities  in  reciprocal  form.  The  packing  algorithm  is  detailed 
in  [34] 

For  this  implementation,  the  precisions  forecast  by  the  error  analyses  of  [10]  and 
[12]  are  the  precisions  used  in  the  internal  computation.  The  built-in  exponential 
and  logarithmic  functions  are  used  internally.  Proposals  have  been  made  for  other 
internal  algorithms  based  on  table  look-up  or  modified  CORDIC  algorithms  [28,  33] 
but  that  is  not  the  topic  under  present  discussion. 

The  following  subsections  discuss  some  of  the  extensions  that  have  been  incor- 
porated into  the  Turbo  Pascal  SLIUNIT.  These  include  implementations  of  algo- 
rithms for  mixed  integer-SLI  arithmetic  using  the  integer  exactly;  computation  of 
the  elementary  functions;  extended  arithmetic  operations  such  as  summation,  scalar 
products,  vector  norms  and  polynomial  evaluation;  and  complex  SLI  arithmetic  us- 
ing the  polar  representation  of  complex  numbers.  All  of  these  work  directly  with 
the  SLI  representations  of  the  various  arguments.  Some  of  these  are  described  or 


^ Turbo  Pascal  is  a trademark  of  Borland  International,  Inc. 
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outlined  in  [34,  35,  36,  37]  although  many  of  those  descriptions  have  since  been 
improved  upon.  We  simply  summarize  some  of  these  features  here. 

The  extended  sums  and  related  operations  are  performed  using  algorithms  which 
exhibit  a natural  parallelism,  and  they  reduce  the  normal  linear  time-penalty  which 
is  expected  for  serial  computation.  Some  of  these  ideas  re-emerge  in  the  parallel 
implementations  — especially  the  massively  parallel  C implementation  described 
in  §6. 


4.1.  Mixed  Integer-SLI  Arithmetic.  One  of  the  benefits  of  the  mixed  algo- 
rithms lies  in  the  fact  that  integers  are  used  exactly.  (Like  any  computing  envi- 
ronment, integer  variables  must  be  stored,  represented  and  operated  upon  without 
incurring  error.)  Apart  from  the  basic  arithmetic  operations  of  integer-SLI  ad- 
dition, subtraction,  multiplication  and  division,  the  integer  power  function  and 
integer-roots  of  any  order  are  easily  incorporated  into  the  SLI  arithmetic  frame- 
work. 

We  illustrate  this  by  considering  the  operation  of  forming  integer  powers  of  SLI 
variables.  This  has  further  application  in  the  evaluation  of  polynomial  functions 
using  a technique  similar  to  that  employed  for  the  extended  arithmetic  operations 
below. 

We  thus  require  z and  the  associated  signs  such  that 
Z = ±(l>{z)^^  = {±(i>{x)^^)^  = 
where  N is  an  integer.  The  two  signs  are  easily  resolved: 

Z < 0 iff  (X  < 0)  and  {N  mod  2 = 1), 

|Z|  >1  iff  (W  = 0)  or  (|X|  > 1,  JV  > 0)  or  (|X|  < 1,  N < 0). 

The  problem  thus  reduces  to  forming  positive  integer  powers  of  positive  quanti- 
ties greater  than  unity: 

(t>{z)  = 4>{x)^ 

with  W > 0.  Taking  natural  logarithms,  we  obtain 

(j){z  — 1)  = N(f){x  — 1) 


from  which  it  follows  that 


(29) 


Cl 


(t>{z  - 1) 

(f){x  - 1) 


= N. 


The  algorithm  is  completed  by  generating  the  c-sequence  as  was  done  in  §2;  cf.  eqs. 
(17)  and  (18). 

The  modifications  for  the  other  mixed  operations  are  mostly  even  simpler  than 
this.  They  all  share  the  property  that  the  integer  is  used  exactly  rather  than  being 
first  converted  (promoted)  to  its  SLI  representation.  Observe  that  this  property  is 
not  shared  by  other  arithmetic  systems  - except  perhaps  for  this  power  operation 
if  repeated  multiplication  is  used. 
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4.2.  SLI  Algorithms  for  Elementary  Functions.  Algorithms  for  some  of  the 
elementary  functions  are  necessarily  simpler  in  SLI  than  in  other  arithmetic  formats. 
Others,  of  course,  are  more  complicated.  Among  the  ones  which  are  simplified,  not 
surprisingly,  are  the  natural  logarithmic  and  exponential  functions.  By  definition. 

In  [(f>{x)^^)  = ±<f>{x  - 1) 

and  the  only  complication  arises  when  x < 2 so  that  <j){x  — 1)  must  be  converted 
back  from  its  fixed-point  fraction  to  its  SLI  representation  by  forming  (perhaps 
repeated)  logarithms  of  this  result.  Thus,  for  example,  for  ^(x)  = 2 = (^(1  -|-  In  2) 
we  get 

ln(<^(x))  = ln2  .=  = ^(i  - lnln2)-'  = <^(1.3665...)-^ 

Functions  such  as  absolute  value,  reciprocation,  and  negation  are  very  simple 
bitwise  operations  on  the  slisingle  representation  of  the  variable.  Indeed,  all 
of  these  are  implicit  in  the  sign  determination  process  for  the  basic  arithmetic 
algorithms. 

From  the  integer  power  operation  above,  it  should  be  apparent  that  evaluation 
of  monomial  terms  is  also  straightforward.  Taking  logarithms  as  in  (29), 

(f>{z)  - (f>{y)  * (f){x)^ 


yields 


(30) 


Cl 


<t>{z  - 1) 

<f>{x  - 1) 


= N + 


Hy- 1) 

<j){x  - 1) 


= N + bi 


This  can  be  made  into  an  efficient  algorithm  for  the  evaluation  of  general  polyno- 
mials by  using  the  extended  operations  described  later. 

The  other  elementary  functions  which  are  incorporated  into  the  Turbo  Pascal 
implementation  are  the  basic  trigonometric  functions.  The  only  ones  built  into 
Turbo  Pascal  itself  are  sin,  cos  and  arctan.  The  SLIUNIT  is  restricted  to  these 
same  functions.  The  first  two  make  no  direct  use  of  the  SLI  representation  of  their 
arguments,  but  use  instead  conversion  between  floating-point  and  SLI  representa- 
tions. One  reason  for  this  choice  is  that  for  sufficiently  large  arguments,  the  real 
interval  represented  by  a specific  floating-point  or  SLI  number  covers  more  than 
a period  of  the  function.  There  is,  therefore,  nothing  to  be  gained  by  trying  to 
evaluate  these  functions  to  high  accuracy  for  SLI  numbers  outside  the  range  of  the 
floating-point  system. 

By  contrast,  the  arctangent  function  does  have  a legitimate  domain  over  the 
whole  real  line.  By  using  the  identities 


(31) 

(32) 


. TT  1 

arctan  |A  | = arctan  -— 

2 |a| 

arctan(— X)  = — arctan(X), 


J 


the  SLI  arctangent  algorithm  is  reduced  to  evaluation  of  arctan  (ao)  where,  as  usual, 
ao  = l/</'(a:). 
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4.3.  Complex  SLI  Arithmetic.  Complex  SLI  arithmetic  is  incorporated  into 
the  Turbo  Pascal  SLIUNIT  using  the  polar  representation 

(33)  Z = 

where  the  modulus  is  represented  in  standard  SLI  form  and  the  argument  6 is  stored 
as  a fixed-point  fraction  in  [—1, 1)  of  tt.  The  two  parts  can  be  packed  conveniently 
into  the  64-bit  integer  Turbo  Pascal  type  comp  to  facilitate  their  use  in  “in-line” 
function  and  arithmetic  calls. 

The  complex  SLI  arithmetic  algorithms  can  be  achieved  without  the  need  to 
convert  to  real  and  imaginary  parts,  and  to  make  multiple  calls  to  the  unde.^ying 
real  algorithms,  by  using  the  cosine  rule  for  the  appropriate  “triangle”  in  the  com- 
plex plane.  The  algorithm  for  “large”  addition  proceeds  much  as  for  real  arithmetic 
using 


(34) 


Cq  — 1 “h  6q  “1“  26o  cos  B 


and  then 


(35) 


Cl  = 1+  -ao  In  (cq) 


where  6 is  the  angle  between  the  two  “position  vectors”.  Of  course,  since  bo  is 
computed  by  evaluating  the  exponential  function,  6q  can  be  computed  with  no  ad- 
ditional difficulty.  The  computation  of  the  modulus  of  the  result  can  be  completed 
using  the  usual  SLI  algorithm.  Modifications  for  the  “mixed”  and  “small”  cases 
are  similar. 

Similarly,  it  turns  out  that  A6,  the  difference  between  the  argument  of  the  result 
and  that  of  the  larger  operand  Bq,  is  obtainable  from 


(36) 


AB  = arctan 


bo  sin  Bo 
1 -h  bo  cos  Bo 


The  derivation  of  these  equations  (34),  (35)  and  (36)  are  given  in  [37],  which  also 
includes  an  error  analysis  and  observations  on  the  implementation  of  the  special 
steps  which  are  required  by  these  equations.  The  important  finding  is  that  it 
remains  true  that  fixed-point  internal  computation  is  sufficient,  with  similar  working 
precisions  to  those  required  for  the  conventional  real  SLI  algorithms. 

Of  course,  the  use  of  the  polar  form  for  complex  arithmetic  has  the  effect  of 
making  complex  multiplication  equivalent  to  a single  real  multiplication  for  the 
modulus  and  a fixed-point  addition  for  the  argument.  This  compares  with  the  usual 
six  real  operations  for  a conventional  complex  product.  There  is  no  compensating 
additional  cost  since  the  algorithm  just  outlined  is  also  cheaper  than  two  real  SLI 
additions. 


4.4.  Extended  SLI  Operations.  The  potential  for  parallelism  in  the  SLI  ex- 
tended operation  algorithms  is  discussed  in  [23],  and  in  [36]  in  which  the  algorithm 
for  extended  summation  is  also  detailed  and  analyzed.  We  content  ourselves  here 
with  an  outline  of  the  algorithm  for  extended  summation  and  a description  of  its 
uses  in  the  Turbo  Pascal  implementation  to  compute  scalar  products  and  vector 


norms. 
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The  basic  problem  of  extended  SLI  summation  is  to  find 

N 

1=0 

where  we  shall  assume  that  Xq  - ±<f>{xo)^^  is  the  largest  magnitude  term.  (Since 
the  binary  representation  is  consistent  with  integer-order,  identifying  this  largest 
argument  is  a simple  operation.)  For  the  (usual)  case  where  |Xo|  > 1,  the  algorithm 
can  proceed  as  above  after  the  computation  of 

Co  = 1 + 51  ^*^0  ^ X]  ®*“o“o  ^ 

7-i  = -(-l  ri  = -l 

where  Sj,  represent  the  sign  and  reciprocation  sign  of  Xi  and  = <f){xi)/^{xo), 
ao  = l/(^(xo),  = l/<f)(xi).  It  is  apparent  that  only  one  o-sequence  and  one 

c-sequence  are  needed,  together  with  either  an  a-  or  a fe-sequence  for  each  of  the 
other  terms.  This  represents  a saving  of  about  67%  of  the  work  that  would  be 
needed  for  the  usual  serial  summation  of  the  same  terms.  For  the  case  where  all 
terms  are  “small”  there  is  a similar  modification. 

The  formation  of  scalar  products  can  now  be  achieved  with  a two-stage  pro- 
cess. The  first  is  the  elementwise  multiplication  of  the  two  SLI-vectors  which  is 
followed  by  this  extended  summation.  This,  like  the  basic  summation  operation,  is 
incorporated  into  the  Turbo  Pascal  implementation  in  just  this  way. 

Similarly,  the  usual  vector  norms  could  be  computed  by  first  forming  the  appro- 
priate powers  of  each  element,  then  using  this  summation  algorithm,  and  finally 
taking  the  appropriate  “root”  of  the  result.  However,  this  can  be  (and  is)  improved 
further  by  observing  that  the  largest  term  will  also  have  the  largest  p-th  power. 
Thus,  the  definition  of  cq  is  adjusted  so  that 

and  then  its  p-th  root  is  formed  within  the  first  logarithm  of  the  c-sequence.  The 
details  are  not  important  here.  The  point  is  that  it  illustrates  both  the  versatility 
of  the  SLI  algorithm  for  modification  to  more  complicated  computation  and  its 
suitability  to  parallel  implementation. 

5.  FORTRAN  90  Implementation 

An  implementation  in  Fortran  90  was  published  in  the  1993  Ph.  D.  thesis  [31]  of 
I.  Reid.  Its  purpose  coincides,  at  least  in  part,  with  that  of  the  earlier  Fortran  77  im- 
plementation of  D.  W.  Lozier;  cf.  §3  of  this  paper.  Both  provide  for  re-interpreting 
and  re-executing  Fortran  programs  in  SLI  arithmetic  through  the  introduction  of 
a new  data  type.  The  later  implementation  does  not  support  level-index  variables 
and  operations,  but  it  uses  the  efficient,  singly  recursive  algorithms  of  §2.  Efficiency 
is  improved  further  when  an  operand  is  of  type  INTEGER  by  modifications  that  use 
the  integer  representation  directly,  as  was  done  in  the  Turbo  Pascal  implementa- 
tion. Special  emphasis  is  placed  on  simulating  the  exact  bit-precision  that  is  called 
for  in  theory  in  the  fixed-point  generation  of  the  a-,  b-,  c-  and  related  sequences, 
thereby  providing  for  the  possibility  of  justifying  the  theory  experimentally.  The 
thesis  includes  a test  program  that  was  used  for  this  purpose. 
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As  was  noted  in  §3,  Fortran  90  includes  among  its  improvements  over  Fortran 
77  the  capability  to  introduce  new  data  types,  extend  to  them  standard  Fortran 
operators  and  functions,  and  supply  for  them  additional  operators  and  functions. 
Accordingly,  there  is  no  need  for  an  approach  using  a precompiler. 

5.1.  Structure.  Variables  of  the  new  data  type  SLI  occupy  two  locations  of  type 
LOGICAL  (for  the  sign  and  reciprocation  bits),  a location  of  type  INTEGER  (for  the 
level),  and  a location  of  type  DOUBLE  PRECISION  (for  the  index).  This  internal 
organization  is  of  no  concern  to  the  user,  as  the  declaration 
TYPE(SLI)  var_l,  var_2,  ...,  var_n 
suffices  to  identify  variables  of  the  new  type. 

Operators  and  functions  are  provided: 

Unary  minus  - 

Arithmetic  operators  +,  -,  /,  *,  **  (with  implicit  type  conversion) 

Logical  operators  ==,  /=,  >,  >=,  <,  <= 

Assignment  = (with  implicit  type  conversion) 

Absolute  value  and  integer  root  functions  SLIABS  and  INTJIOOT 
Exponential  and  logarithmic  functions  SLIEXP  and  SLILN 
Generalized  distance  GD 

Reciprocal  and  set-reciprocal-bit  functions  RECIP  and  RECIP_TRUE 
With  the  exception  of  the  logical  operators  and  the  generalized  distance,  the  result 
of  all  operators  and  functions  is  of  type  SLI.  Arithmetic  operators  and  assignment 
support  implicit  type  conversion,  i.e.  an  operand  can  be  of  type  INTEGER,  REAL  or 
DOUBLE  PRECISION.  In  all  other  cases,  the  operands  must  be  of  type  SLI.  Logical 
operators  are  supported  only  in  the  new-style  Fortran  90  notation;  the  old-style 
. EQ . , etc.,  cannot  be  used  with  operands  of  type  SLI.  The  set-reciprocal-bit  function 
forces  the  reciprocation  sign  to  be  -|-1. 

The  generalized  distance  was  introduced  to  facilitate  comparisons  in  the  test 
program.  A problem  with  any  computer  arithmetic  system  is  how  to  measure  the 
difference  between  computed  results.  In  an  LI  system,  it  is  customary  to  use  the 
norm 

\\X-Y\\  = \^{X)-^{Y)\ 

where  the  generalized  logarithm  (4)  is  extended  to  the  negative  real  axis  by  odd 
symmetry.  In  an  SLI  system,  we  cannot  just  substitute  the  SLI-mapping  (28) 
because  of  the  singularity  at  the  origin.  The  function 

gd(x,y)  = |^(x)-^(y)| 

is  a meaningful  measure  if  and  only  if  X and  Y are  of  like  sign.  Therefore,  Reid 
introduced  the  expanded  definition 

r37l  edrx  yj  = if  signs  same, 

’ I |^(|X|) -|- ^(|y  I) -|- 14|  otherwise 

together  with  the  convention  that  ^(0)  = —7.  The  constants  7 and  14  arise  because 
of  the  maximum  level  that  is  allowed  in  Reid’s  implementation.  In  effect,  Reid 
replaces  the  SLI  representation  of  zero  (which  would  most  naturally  be  defined  as 
zero)  with  the  smallest  representable  SLI  number.  The  expanded  definition  allows 
small  numbers  of  opposite  sign  to  be  “close”  to  each  other. 
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5.2.  Testing.  The  expanded  gd  (generalized  distance)  function  (37)  is  used  to 
compare  a result  computed  in  SLI  arithmetic  against  a more  accurate  value  com- 
puted in  higher  precision.  The  48-bit  SLI  format  allocates  43  bits  to  the  index. 
Accordingly,  with  70  denoting  “machine  epsilon” , we  have 

.V  — 9-43 
70  — 4 

for  abbreviated  SLI  arithmetic.  The  comparison  values  are  computed  in  double 
precision  wherever  possible,  and  where  not  possible  in  unabbreviated  SLI.  In  either 
case  the  machine  epsilon  7 is  the  same, 

7 = 


for  IEEE  arithmetic. 

The  test  program  applies  to  the  binary  operations  +,  -,  *,  / and  **.  That  is,  a 
function 

Z = f{X,Y) 

is  being  tested,  where 

X ^ iHix),  Y = ^(y),  Z = ^(2). 

Now  fix  attention  on  a pair  of  operands  x and  y.  Assume  these  are  stored  without 
error  in  double  precision  in  the  test  program.  The  fractional  parts  are  truncated  to 
43  bits,  and  the  resulting  numbers  i,  y are  stored  in  SLI  format.  Then  the  result 
and  generalized  distance 

i==/($(x),$(y)),  d = gd{Z,Z) 

are  computed.  For  + and  *,  d < 70  is  used  as  the  acceptance  criterion  for  accurate 
results. 

This  criterion  is  too  severe  for  other  operations  because  their  algorithms  involve 
subtraction  with  the  attendant  possibility  of  heavy,  significance-losing  cancellation. 
Therefore,  the  inherent  error  due  to  a last-bit  perturbation^  of  the  operands  is 
estimated  by  computing 

gd{Z,Zi)  + gd{Z,Z2) 

2 

where 

Zr  = f ($(x  + 70/2),  $(y))),  Z2  = f ($(x),  $(y  + 70/2))). 

Then  the  acceptance  criterion  becomes  d < d.  These  computations  are  done  in 
double  precision  or  unabbreviated  SLI,  whichever  is  appropriate. 

We  have  introduced  the  “machine  epsilons”  70  and  7 corresponding  to  SLI  arith- 
metic with  43-bit  index  and  “unabbreviated”  SLI  arithmetic  on  an  IEEE  computer. 
Linearized  perturbation  theory  has  been  applied  to  determine  the  required  machine 
epsilons  for  the  o-,  b-,  c-  and  related  sequences.  After  a review,  Reid  concludes  that 
the  choices 

To  — 4 , 7l  — 4 , 72  — 4 

^Although  the  natural  perturbation  would  appear  to  be  70 1 Reid  uses  half  this  value  in  his 
thesis. 
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are  compatible  with  the  theory,  where  the  smallest  machine  epsilon,  72,  is  used  for 
o-sequences  and  71  is  used  for  all  other  sequences.  His  results  of  extensive  testing 
support  the  adequacy  of  these  choices. 

But  Reid  also  observes  that  it  would  be  advantageous  in  a hardware  implemen- 
tation to  be  able  to  reduce  the  wordlength  implied  by  72.  Such  a reduction  for 
72  and  even  for  71  might  be  possible,  since  the  linearized  theory  does  not  produce 
sharp  bounds.  Accordingly,  he  repeated  the  tests  with 

72-71,  7le{2-^2-'‘^2-^^}. 

The  tests  passed  the  acceptance  criteria  for  the  two  smaller  values  of  71,  but  not 
when  the  largest  value  was  used. 

6.  Massively  Parallel  C Implementation 

The  most  recent  implementation  of  SLI  arithmetic  is  as  a part  of  the  Computer 
Arithmetic  Laboratory  being  developed  for  the  MasPar  MP-1  system.  This  project 
has  been  outlined  in  [1].  The  architecture  of  this  machine  is  a rectangular  SIMD 
array  of  processors  with  nearest  neighbor  connections  and  toroidal  wraparound. 
The  particular  system  being  used  has  4096  processors  in  a 64x64  array.  There  are 
two  languages  available  on  this  system:  a variant  of  Fortran  90  with  High  Perfor- 
mance Fortran  extensions,  and  a parallel  extension  of  ANSI  C.  The  implementation 
of  SLI  arithmetic  uses  the  latter.  In  this  section,  we  begin  with  a brief  description 
of  the  computer  system  and  its  suitability  for  this  purpose.  This  is  followed  by  a 
description  of  the  SLI  arithmetic  algorithms  used  which  are  further  modifications  of 
those  employed  by  the  earlier  implementations.  The  modified  algorithm  possesses 
a greater  degree  of  natural  parallelism,  especially  for  extended  operations. 

6.1.  The  MasPar  MP-1  System.  As  is  stated  above,  the  system  being  used 
is  a 64x64  SIMD  array.  The  individual  processors  are  just  4-bit  processors  so 
that  all  the  built-in  arithmetic  is  implemented  in  microcode.  The  power  of  the 
system  is  derived  from  the  massive  parallelism  that  is  available  for  appropriate 
computations.  Like  all  SIMD  architectures,  at  any  point  in  a program  all  processors 
are  either  performing  the  same  instruction  (on  their  individual  data)  or  are  inactive. 
The  advantages  of  using  such  a system  for  implementing  experimental  arithmetic 
systems  arise  out  of  its  flexibility. 

For  example,  the  arithmetic  can  be  implemented  in  serial  in  such  a way  that 
the  computation  is  spread  across  the  processor  array.  This  allows  the  computation 
to  take  advantage  of  the  parallelism  to  reduce  the  time-penalty  which  is  otherwise 
incurred  by  a software  implementation.  By  implementing  floating-point  arithmetic 
in  a similar  manner,  reasonably  fair  comparisons  between  the  execution  times  of 
the  two  systems  can  be  made. 

The  other  great  advantage  is  that  minor  adjustments  in  the  algorithms  can  be 
implemented  with  relative  ease.  This  alleviates  the  need  for  building  experimental 
hardware  until  after  extensive  experimentation  has  been  performed.  In  a similar 
manner,  the  area-speed  trade-off  can  be  examined  by  restricting  the  amount  of 
parallelism  allowed  in  a particular  implementation. 

The  language  being  used  is  MPL  [Massively  Parallel  Language)  which  is  a paral- 
lel extension  of  C.  The  primary  augmentation  of  ANSI  C comes  form  the  inclusion 
of  •plural  variables  of  all  the  standard  and  user-defined  types;  all  such  variables  have 
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an  instance  on  each  of  the  processors  in  the  array.  Each  processor  has  its  own 
memory  so  that  the  system  is  a distributed  memory  machine.  Each  processor  is 
connected  to  its  four  nearest  neighbors  (in  the  North,  East,  South  and  West  direc- 
tions) through  the  Xnet  while  more  distant  communication  is  achieved  through  the 
router  hardware  and  software. 

6.2.  SLI  Algorithm  Modification.  The  underlying  representation  of  SLI  vari- 
ables in  this  implementation  is  essentially  similar  to  that  of  the  Turbo  Pascal  im- 
plementation described  in  §4.  The  algorithms  used  are  modifications  of  those  of 
§2  and  we  highlight  only  the  variations  here.  The  principal  modification  is  for  the 
basic  SLI  addition  and  subtraction  algorithm.  The  b-  and  /3-sequences  are  replaced 
by  making  further  use  of  the  a-sequence  in  a way  which  makes  the  parallelism  of 
the  algorithm  more  evident. 

Consider  again  the  basic  SLI  arithmetic  operation  which  we  can  describe  as 
finding  z and  its  signs  such  that 

(38)  ± ^Z  = X±Y  = ± 

where  we  shall  assume  for  simplicity  that  X >Y  > 0 so  that  (38)  becomes 

<^(z)±i  = Z = X±Y  = ± 

The  modified  algorithm  reduces  to 

Algorithm:  Modified  SLI  Addition/Subtraction  Algorithm 

Input  SLI  representations  <f){xy^ , 4>{yy^  oi  X >Y  >0. 

Compute  o-sequence:  aj  = l/(f){x  — j), 
a-sequence:  Oj  — l/<^(y  — j), 
if  rx  = ry  — +1  then  co  = 1 ± ao/oio, 
if  Tx  = —ry  — -l-l  then  cq  = 1 ± oocco, 
if  rx  = ry  = —1  then  cq— 1 = 1 ± qo/oo- 
Complete  the  algorithm  exactly  as  described  in  §2. 

Output  SLI  representation  <f>{zy^  of  Z > 0. 

Once  this  modification  is  incorporated,  the  completion  of  the  algorithm  is  sim- 
plified and  the  corresponding  adjustments  to  the  multiplication,  division  and  other 
operation  algorithms  are  easily  obtained. 

One  of  the  important  aspects  to  stress  here  is  the  great  advantage  that  this 
yields  for  a SIMD  parallel  algorithm.  For  example,  extended  summation  where 
the  largest  operand  Xq  > 1 now  just  requires  the  (simultaneous)  computation  of 
the  o-sequences  for  each  operand  followed  by  the  extended  fixed-point  summation 
Co  = 1 + SiOo  (ai)”’^’  where  Si  is  the  sign  and  ri  denotes  the  reciprocation  sign 
of  Xi,  and  Oi  = l/^{xi).  (The  use  of  (ai)~^’  is  purely  a notational  convenience;  it 
does  not  imply  that  this  is  an  appropriate  computational  procedure.) 

Clearly,  the  parallel  array  can  be  used  to  implement  the  simultaneous  calculation 
of  all  these  a-sequences.  The  extended  summation  can  be  implemented  using  the 
usual  recursive  doubling  algorithm  which  is  already  available  in  MPL  as  reduceAddz 
where  the  x is  used  to  denote  the  appropriate  variable  type  for  the  result.  There 
are  several  useful  reduction  algorithms  built  into  the  MPL  language,  including 


PARALLEL  AND  SERIAL  IMPLEMENTATIONS  OF  SLI  ARITHMETIC 


21 


reduceMsix  which  will  be  of  value  in  identifying  the  maximal  element  in  the  sum  at 
the  beginning  of  this  extended  algorithm. 

MPL  also  includes  a 64-bit  integer  type  long  long  which  can  be  utilized  for 
this  extended  sum  to  avoid  any  complication  in  the  algorithm  to  cope  with  the 
possibility  that  cq  requires  several  bits  for  its  integer  part.  Indeed,  the  underlying 
arithmetic  of  this  implementation  uses  another  aspect  of  the  Computer  Arithmetic 
laboratory,  fixed-point  fractions  of  varying  lengths.  These  are  all  embedded  into 
variables  of  type  long  long.  The  lengths  used  are  measured  in  hexadecimal  digits 
(called  “nibbles”)  because  these  are  the  natural  units  for  the  4-bit  processors  used 
by  the  MasPar  architecture. 

All  the  other  extended  operations  described  previously  can  be  similarly  efficiently 
implemented  using  the  parallel  array. 

7.  Future  Developments 

In  this  paper,  we  have  described  the  current  software  implementations  of  LI  and 
SLI  arithmetic.  The  most  recent  of  these,  the  parallel  implementation  in  MPL  on 
the  MasPar  MP-1  system,  should  also  provide  a good  first  step  towards  an  eventual 
hardware  implementation.  For  SLI  as  for  other  alternative  computer  arithmetic 
systems,  the  biggest  obstacle  is  the  transition  from  interesting  mathematical  idea 
to  practical  hardware.  In  [28]  and  [33],  some  possible  hardware  algorithms  were 
explored.  The  next  developments  in  the  MPL  implementation  will  involve  further 
numerical  experimentation  in  a variety  of  applications  and  further  development  of 
potential  hardware  designs. 

One  of  the  advantages  of  this  massively  parallel  array  is  that  it  makes  it  possible 
to  simulate  the  components  of  a hardware  algorithm,  and  to  experiment  with  the 
details  of  this  algorithm  without  the  need  to  build  chips  until  extensive  testing  has 
been  conducted.  Testing  of  the  underlying  hardware  algorithm  can  be  performed 
by  simulating  CORDIC  units  for  special  forms  of  the  exponential  and  natural  log- 
arithm functions.  Similarly,  the  use  of  carry-save-adder  trees,  look-up  tables  and 
other  components  can  be  evaluated  through  simulation.  Such  a simulated  hardware 
implementation  will  allow  further  experimentation  on  the  working  precisions  that 
are  needed  in  order  to  deliver  the  required  accuracy  in  final  results,  extending  the 
testing  performed  in  [31].  By  restricting  the  active  set  of  processors,  speed-area 
trade-offs  can  also  be  assessed.  All  of  these  and  other  implementation  details  will 
be  the  subject  of  continued  research  in  this  area. 
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